feat: Add Support for Qwen2.5Omni Model and MiniCPM-o-4_5 Model by KKkai0315 · Pull Request #612 · UbiquitousLearning/mllm

KKkai0315 · 2026-01-23T07:51:41Z

Summary by CodeRabbit

New Features
- Qwen2.5 Omni: full multimodal support (text, vision, and audio), with a tokenizer, audio preprocessing, model configs, and interactive example CLIs.
- MiniCPM-o-4_5: multimodal model with TTS and end-to-end token→wav synthesis, plus prompt cache tools, tokenizers, and example CLIs.
Backends & Layers
- CPU backend adds ConvTranspose1D and Tanh ops with corresponding NN layer support.
Tools
- New scripts to convert token2wav formats and export prompt caches.
Tests
- Kernel/unit tests for ConvTranspose1D and Tanh.
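
For reviewers who want a quick mental model of the two new CPU ops, here is a minimal reference sketch in plain Python. It is not the code in this PR: the function names, the argument order, and the `[C_in][C_out][K]` weight layout are assumptions borrowed from the common PyTorch convention, and dilation and output padding are left out. A reference like this is the sort of thing a kernel test could compare against.

```python
import math


def conv_transpose1d(x, w, bias=None, stride=1, padding=0):
    """Reference 1D transposed convolution (hypothetical helper, not the mllm API).

    x: input as nested lists, shape [C_in][L_in]
    w: weights as nested lists, shape [C_in][C_out][K] (assumed layout)
    Output length per channel: (L_in - 1) * stride - 2 * padding + K
    """
    c_in, l_in = len(x), len(x[0])
    c_out, k = len(w[0]), len(w[0][0])

    # Scatter every input sample into an unpadded output buffer.
    full_len = (l_in - 1) * stride + k
    full = [[0.0] * full_len for _ in range(c_out)]
    for ci in range(c_in):
        for i in range(l_in):
            for co in range(c_out):
                for j in range(k):
                    full[co][i * stride + j] += x[ci][i] * w[ci][co][j]

    # Padding in a transposed convolution trims samples from both ends.
    out = [row[padding:full_len - padding] for row in full]
    if bias is not None:
        out = [[v + bias[co] for v in row] for co, row in enumerate(out)]
    return out


def tanh(values):
    """Element-wise hyperbolic tangent over a flat list."""
    return [math.tanh(v) for v in values]
```

With a single channel, `conv_transpose1d([[1.0, 2.0, 3.0]], [[[1.0, 1.0]]], stride=2)` upsamples the three inputs to six outputs, which is the role a transposed convolution typically plays in a vocoder-style token→wav path.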